Segmental quantization of speech spectral information

Author

  • Torbjørn Svendsen
Abstract

The majority of current speech coding algorithms for medium-to-low bit rates transmit two information components: a short-time spectrum estimate and an excitation signal. Even though advanced intraframe quantization schemes have been proposed, the spectral information still consumes a large proportion of the available bit rate. For many speech sounds, the speech spectrum is relatively smooth over time intervals much longer than the sampling interval of the spectrum estimates. Thus, compression can be obtained by identifying smoothly varying segments of the speech spectrum and transmitting the spectral information only once for each segment. The segment spectral information is then an approximation to the true spectrum, but if the segmentation criterion is properly chosen, the induced distortion can be kept within the acceptable 1 dB mean spectral distortion limit. In the present paper we show that segment quantization can be applied to reduce the required bit rate for the spectral information by a factor of approximately two without compromising the total spectral distortion.

INTRODUCTION

The transmission of the speech spectral information consumes a major part of the available bit rate in current medium-to-low bit rate speech coders. Several researchers have made important contributions to the design of efficient quantization schemes for the spectral information, thereby reducing the previous lower bound for acceptable quantization (i.e. achieving a mean spectral distortion of 1 dB) from 40 bits per 10-coefficient spectral vector first to 32 bits [1] and then to 24 bits/vector [2]. Although these achievements have resulted in a significant bit rate reduction, they have only exploited intraframe properties of the speech spectrum. The bit rate required for proper transmission of the spectral information depends on the frame rate as well as on the number of bits used to quantize each individual frame.
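The dependence of the bit rate on frame rate and bits per frame can be illustrated with a little arithmetic. The sketch below uses the bits-per-vector figures quoted above; the 20 ms frame interval and the function name `spectral_bit_rate` are assumptions chosen for illustration, not values taken from the paper.

```python
def spectral_bit_rate(bits_per_frame: float, frame_interval_ms: float) -> float:
    """Bits per second spent on spectral information at a fixed frame rate."""
    frames_per_second = 1000.0 / frame_interval_ms
    return bits_per_frame * frames_per_second


# Assumption: one spectral vector every 20 ms (50 frames per second).
FRAME_INTERVAL_MS = 20.0

# Bits per spectral vector quoted in the text: 40, then 32 [1], then 24 [2].
rates = {bits: spectral_bit_rate(bits, FRAME_INTERVAL_MS) for bits in (40, 32, 24)}

# The abstract claims a reduction by a factor of approximately two.
segmental_rate = rates[24] / 2.0

for bits, rate in rates.items():
    print(f"{bits} bits/frame -> {rate:.0f} bit/s")
print(f"24 bits/frame with segment quantization -> about {segmental_rate:.0f} bit/s")
```

Under this assumed frame interval, lowering the number of bits per frame and lowering the effective frame rate are two independent ways to reduce the product, which is the point the last sentence of the paragraph makes.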
The frame rate has traditionally been determined by the frequency of performing the spectral analysis, i.e., the sampling rate (in time) of the time-frequency pattern. This sampling rate has been chosen as a compromise between conflicting interests. The rate should be high enough to capture the spectral transitions that are important to the perceptual quality of the coded speech, and at the same time low enough to give a reasonable bit rate. The speech spectrum is mostly slowly varying; in some cases it can be considered stationary over as much as a few hundred ms. However, the spectral transitions between phonemes (and in some cases also within phonemes, e.g. in plosives) can be rapid, in some cases on the order of 3-5 ms. The typical compromise taken in speech coders is to estimate and transmit the speech spectrum every 10-25 ms irrespective of the current spectral variation.

Because of the necessary compromise when selecting a fixed sampling rate of the spectral information, there is a significant correlation between successive spectral estimates. This can be exploited in inter-frame differential coding schemes (see e.g. [3]) to reduce the bit rate. Another approach to exploiting the inter-frame correlation is taken in segment (or matrix) quantization schemes [4]. In this case the speech is segmented into variable or fixed length segments. If the segment length is variable, some segmentation criterion needs to be applied. Each speech segment is quantized as a single entity by codebook look-up. The codebook entries are matrices consisting of a spectral vector sequence. For fixed length segments, this is a straightforward extension of vector quantization. If variable length segments are used, an interpolation scheme is necessary to align the input with the codebook entries. The matrix quantization approach is well suited for very low bit rate applications. For applications with stricter requirements on the spectral distortion caused by the quantizer, the codebook size will be prohibitive for real-time applications.
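For fixed length segments, the codebook look-up described above amounts to a nearest-neighbour search over matrices. The following is a minimal sketch under stated assumptions: the squared Euclidean distance, the toy codebook values, and the names `matrix_distance` and `quantize_segment` are illustrative choices, not the scheme of [4].

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Vector]  # one spectral vector per frame


def matrix_distance(segment: Matrix, entry: Matrix) -> float:
    """Sum of squared differences over all frames and coefficients (assumed measure)."""
    return sum(
        (x - y) ** 2
        for seg_vec, ent_vec in zip(segment, entry)
        for x, y in zip(seg_vec, ent_vec)
    )


def quantize_segment(segment: Matrix, codebook: List[Matrix]) -> int:
    """Return the index of the codebook matrix closest to the segment."""
    return min(range(len(codebook)), key=lambda i: matrix_distance(segment, codebook[i]))


# Hypothetical toy codebook: 4 entries, each 3 frames of 2 coefficients.
codebook = [
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],
    [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]],
    [[1.0, 1.0], [0.5, 0.5], [0.0, 0.0]],
]

# A 3-frame input segment that rises roughly like entry 2.
segment = [[0.1, 0.0], [0.4, 0.6], [0.9, 1.1]]
index = quantize_segment(segment, codebook)
print(index)  # only this index is transmitted for the whole segment
```

An index into a codebook with N entries costs log2(N) bits per segment, so every additional bit doubles the number of matrices to store and search. This exponential growth is one way to read the remark that the codebook size becomes prohibitive when low distortion is required.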
The use of variable length segments is appealing, as it makes it feasible to exploit the varying stationarity duration of the various speech sounds. When the length of the segments depends on the rapidity of the spectral variation, steady state vowels will produce longer segments than e.g. plosives. These quasi-stationary segments can be efficiently represented by some simple mathematical approximation, which is much more computationally efficient than the matrix quantization approach. A number of different strategies for obtaining a segmentation suitable for low bit rate transmission of the spectral information have been proposed in the literature. A linear interpolation approach is taken in [5]. Here, starting with a segment length of 2 frames, the first and last frames in a segment are taken as reference vectors. An approximation of the LAR spectral vectors ...
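The source text breaks off before the interpolation scheme is fully described, so the following is only a hedged sketch of the general idea: grow a segment from 2 frames, approximate the intermediate vectors by linear interpolation between the first and last frame, and stop growing when the approximation error becomes too large. The stopping rule (maximum squared error against a threshold), the sharing of reference frames between adjacent segments, and all function names are assumptions, not details confirmed by [5].

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def interpolate(first: Vector, last: Vector, t: float) -> List[float]:
    """Point on the straight line from first (t=0) to last (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(first, last)]


def segment_error(frames: Sequence[Vector], start: int, end: int) -> float:
    """Largest squared error of the interior frames against the interpolated line."""
    worst = 0.0
    span = end - start
    for k in range(start + 1, end):
        approx = interpolate(frames[start], frames[end], (k - start) / span)
        err = sum((x - y) ** 2 for x, y in zip(frames[k], approx))
        worst = max(worst, err)
    return worst


def segment_frames(frames: Sequence[Vector], threshold: float) -> List[Tuple[int, int]]:
    """Greedy segmentation; returns (first, last) reference frame indices per segment."""
    segments = []
    start, last = 0, len(frames) - 1
    while start < last:
        end = start + 1  # every segment starts out 2 frames long
        # Assumed criterion: extend while interpolation error stays under threshold.
        while end < last and segment_error(frames, start, end + 1) <= threshold:
            end += 1
        segments.append((start, end))
        start = end  # assumption: the last reference frame opens the next segment
    return segments


# Hypothetical 1-coefficient track: a linear rise, a steady state, then a jump.
frames = [[0.0], [1.0], [2.0], [3.0], [3.0], [3.0], [0.0]]
segments = segment_frames(frames, threshold=0.01)
print(segments)
```

In this toy example only the reference frames at the segment boundaries would need to be transmitted, together with the segment lengths, so that a decoder could rebuild the intermediate frames by the same interpolation. Slowly varying stretches yield long segments and abrupt changes yield short ones, which matches the behaviour the paragraph describes for vowels and plosives.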

Similar resources

Novel low-band phase representation for low bit-rate speech coding

Vector Quantization (VQ) has been extensively used in speech vocoders. Phase information is often ignored or coarsely represented in parametric coders because of the difficulties facing phase quantization. This paper introduces a novel distortion measure for the low-band speech signal that takes phase information into consideration, with no increase in the bit-rate. This measure has been used i...

Using Exciting and Spectral Envelope Information and Matrix Quantization for Improvement of the Speaker Verification Systems

Speaker verification from a few spoken words or sentences has many applications. Many methods, such as DTW, HMM, VQ and MQ, can be used for speaker verification. We applied MQ for its precise, reliable and robust performance combined with computational simplicity. We also used pitch frequency and log gain contour to further improve system performance.

Efficient vector quantization of LPC parameters at 24 bits/frame

Abstract: Linear predictive coding (LPC) parameters are widely used in various speech processing applications for representing the spectral envelope information of speech. For low bit rate speech-coding applications, it is important to quantize these parameters accurately using as few bits as possible. Though the vector quantizers are more efficient than the scalar quantizers, their use for accu...

Speech coding using mixture of gaussians polynomial model

We have investigated a novel method of spectral estimation based on mixture of Gaussians in a sinusoidal analysis and synthesis framework. After quantisation of this parametric scheme a fixed frame-rate coder operating at a bit-rate of around 2.4 kbits/s has been developed. This paper describes an extension to this spectral model based on constraining the parameters of the mixture of Gaussians to...

Enhancement of hearing-impaired Mandarin speech

This paper presents a new voice conversion system that modifies misarticulations and prosodic deviations of hearing-impaired Mandarin speech. The basic strategy is the detection and exploitation of characteristic features that distinguish the impaired speech from the normal speech at segmental and prosodic levels. For spectral conversion, cepstral coefficients were characterized under the fo...

Publication date: 1994